Do not release a parameter if it will be reused within this threshold of parameters. Smaller values use less memory, but slow down training.